Abstract


We present a technique that enables the use of finite model 
finding to check the satisfiability of certain formulas whose 
intended models are infinite. Such formulas arise when using 
the language of sets and relations to reason about structured 
values such as algebraic datatypes. The key idea of our tech- 
nique is to identify a natural syntactic class of formulas in 
relational logic for which reasoning about infinite structures 
can be reduced to reasoning about finite structures. As a 
result, when a formula belongs to this class, we can use exist- 
ing finite model finding tools to check whether the formula 
holds in the desired infinite model. 


1 Introduction 


A new kind of analysis has become popular in the last decade 
in which a system is examined by considering all small cases 
within some bound. The rationale is that flaws are revealed 
more readily by this method than by conventional testing: 
exhausting a huge space of small cases works better than 
considering a much smaller suite of cases, even if it includes 
larger ones. 

Model checking is the preeminent example of this ap- 
proach, and bounds the set of reachable states and some- 
times also the length of execution traces. The success of 
model checking in hardware verification has generated great 
interest in applying it to software. Most model checkers, 
though, offer only rudimentary support for data structures, 
so most applications of model checking to software until now 
have focused on control properties, and data has either been 
ignored or abstracted away. 

To handle data structures effectively within this context, 
a reduction to small cases is needed. With such a reduction, 
no special abstractions for data would be needed, and the 
same bounding mechanism used for trace length, for exam- 
ple, could be applied to the size of data structures. 

How should data structures be represented in such an 
analysis? A relational representation is very attractive, be- 
cause it fits both the analyses that are widely used at the low 
level, and the object-oriented view of a program at the high 
level. Symbolic model checkers such as SMV [18] already 
represent the state as a bit vector; the adjacency matrix rep- 
resentation of a relation is therefore easily integrated. In the 
object-oriented view of program state, the heap is a graph, 
with objects as nodes and fields of objects as edges between
objects—in other words, a collection of relations, one per 
field. This view now predominates, because it’s simple and 
easily accounts for sharing (a shared object simply being in 
the range of two relations). 

An important question to ask, therefore, is whether this 
relational viewpoint can accommodate a general theory of 
data structures. Can arbitrary structural properties be nat- 
urally expressed and analyzed by the small case approach? 
This question is not only of theoretical interest. It has arisen 
repeatedly amongst advanced users of one tool, the Alloy 
language and its associated analyzer [1], as they have dis- 
covered scenarios in which Alloy’s relational encoding does 
not seem to capture their intuitions about data structures. 

This paper’s aim is to resolve this issue, not only for 
Alloy, but more generally for any tool that relies on small 
case analysis of a relational encoding. This includes not 
only model checkers (such as SMV [18] and NuSMV [6]), but 
also specification analysis tools based on constraints (such 
as ProB [23] and the Bremen USE tool [10]), and indeed 
potentially any tool that encodes data relationally.

To frame the problem rigorously, a characterization of 
data structures independent of the relational viewpoint is 
needed. For this purpose, we use the theory of algebraic 
datatypes, which corresponds to the way most programmers 
think about structured values in data structures, and is the
basis for their implementation in many current programming 
languages. 

We start by explaining the standard encoding of alge- 
braic datatypes using relations in first order logic. This en- 
coding is faithful, but it suffers from a major drawback: it 
requires all models of a formula to be infinite. Consequently, 
an analysis based on finite cases cannot be applied. To rem- 
edy this, we remove the logical axiom that is responsible 
for making models infinite. Surprisingly, most analyses per- 
formed in the absence of this axiom are still sound. There 
are, however, analyses that will produce spurious counterex- 
amples. The principal contribution of this paper is a simple 
syntactic criterion that guarantees that a formula being an- 
alyzed will not suffer from this problem. The criterion is 
easy to understand, and could be applied automatically by 
a tool, warning the user when an analysis on the relational 
encoding might produce results that do not correspond to 
the full theory of algebraic datatypes. 

The contributions of the paper are: 


• A recognition of the problem of handling data structures
in relational encodings, with positive and negative
illustrations;

• A rigorous formulation of the problem, in terms of encoding
algebraic datatype axioms in first order logic,
and invariance of formula semantics under the exclusion
of the ’generator’ axiom that generates infinite models;

• A simple and effective syntactic criterion characterizing
the class of formulas for which an analysis involving
only finite tests is sound and complete.


2 An Example 


In this section, we motivate the problem with an example 
of a relational encoding of a simple algebraic datatype. We 
show how the omission of a ’generator’ axiom can cause 
spurious counterexamples, but its inclusion results in incon- 
sistency, making the encoding useless. The challenge is to 
determine under what conditions the axiom can be omitted 
while remaining faithful to the theory of algebraic datatypes. 

The example will be given in Alloy [17], a modelling lan- 
guage based on a simple first-order logic with relational op- 
erators. Although our work was motivated by Alloy, our re- 
sults apply more broadly, and the rest of the paper presents 
our theory in a standard logic that has no Alloy-specific fea- 
tures. 

A datatype for lists would be declared in a language such 
as ML [31] like this: 


datatype List = Nil | Cons of Element * List 


where List is the datatype being declared, Element is the 
type corresponding to the elements, and Nil and Cons are 
constructors, with no arguments and two arguments respec- 
tively. 

In Alloy, List and Element are represented as top-level 
sets (called ’signatures’ in Alloy). Nil is a singleton set — 
the set containing the empty list. 


sig Element {} 
sig List {} 
one sig Nil extends List {} 


Cons is represented by a set Cons and two selectors, elt and 
rest: 


sig Cons extends List {
  elt: Element,
  rest: List
}


The extends syntax makes Cons a subset of List, disjoint
from Nil. The selectors are semantically just relations from
the set Cons to the sets Element and List respectively.
The function cons can be written as an Alloy predicate


pred cons (e: Element, l: List, c: Cons) {
  c.elt = e and c.rest = l
}


which associates with an element e and a list l any object
c in Cons such that e and l are the element and rest
components of c.

Now let’s consider checking some putative theorems. 
This assertion says that the element of a list just created 
with an application of cons is the element used in the appli- 
cation: 


assert A {
  all e: Element, l: List, c: Cons |
    cons (e, l, c) => e = c.elt
}


This holds trivially, since the consequent is contained in the
hypothesis, and so, when checked by the Alloy Analyzer,
it yields no counterexamples. In contrast, suppose we
check the assertion 


assert B {
  all e: Element, l: List, c, c’: Cons |
    (cons (e, l, c) and cons (e, l, c’)) => c = c’
}


which claims that cons is deterministic. The Alloy Analyzer 
will give a counterexample such as this: 


List = {L0, L1, L2}
Cons = {L1, L2}
Nil = {L0}
Element = {E0}
elt = {(L1, E0), (L2, E0)}
rest = {(L1, L0), (L2, L0)}

e = {E0}
l = {L0}
c = {L1}
c’ = {L2}


in which cons produces the two lists L1 and L2 which are 
structurally identical but nevertheless distinct list objects. 
This might be acceptable for some applications, but if 
we wanted to model the kind of list used in languages such 
as ML, in which equality is structural (and identity of cells 
therefore cannot be observed), we could add an axiom 


fact {
  all l, l’: List |
    l.elt = l’.elt and l.rest = l’.rest => l = l’
}


requiring that structurally identical lists have the same iden- 
tity, ensuring that the assertion B is now valid. 

Continuing with model exploration, we notice that in 
some cases Alloy generates cyclic lists as counterexamples 
of our properties. Because cyclic lists are not algebraic 
datatypes, we introduce the fact 


fact {
  no l: List | l in l.^rest
}


ensuring that rest is an acyclic relation.
So far so good. Consider, however, an assertion claiming 
that cons is total: 


assert C {
  all e: Element, l: List | some c: Cons |
    cons (e, l, c)
}


Given our intuition about algebraic datatypes, we expect 
this assertion to be valid. But in the relational setting, given 
the facts stated so far, the assertion C is actually invalid. The 
Alloy Analyzer will generate a counterexample such as: 


List = {L0}
Cons = {}
Nil = {L0}
Element = {E0}
elt = {}
rest = {}

e = {E0}
l = {L0}


The problem, roughly speaking, is that the Analyzer is free 
to construct a counterexample in which there aren’t enough 
lists. 

Suppose, following our previous strategy, we attempt to 
add an axiom to rule out counterexamples of this form: 


fact {
  all l: List, e: Element | some c: Cons |
    c.elt = e and c.rest = l
}


This ’generator’ axiom ensures that the selector relations
are complete: for any combination of list and element, it
requires the existence of a list of which they are the components.
This will indeed rule out the counterexample above.
Unfortunately, however, it rules out all (nonempty) counterexamples,
even to a manifestly false assertion (for example that
0 equals 1). The problem is that this axiom has no finite
models, and is therefore inconsistent in a setting in which
only finite models are considered.

The key idea of this paper is that a finite checker can- 
not incorporate this axiom, but must nevertheless be able to 
handle algebraic datatypes. The question then is what class 
of formulas have the same models whether or not this axiom 
is included. The contribution of this paper is a characteri- 
zation of a class of such formulas that is both expressive and 
easily checked syntactically. 

Assertion C, it turns out, will not be in this class. The culprit
is the innermost quantification. The nesting of quantifiers 
is not in itself problematic; rather, the problem is that the 
quantification is not bounded. If instead, it were bounded 
by an expression in terms of variables bound in an outer 
quantifier, no spurious counterexample would be generated, 
even in the absence of the generator axiom. For example, 
the assertion 


assert D {
  all c: Cons | some l: c.*rest, e: Element |
    cons (e, l, c)
}


saying that each list is the result of an application of cons to 
some sublist, is valid, as expected (the expression c.*rest 
is the application of the reflexive transitive closure of rest 
to c, giving the set of c’s sublists). 


3 Logic and Algebraic Datatypes 


This section introduces our two formalisms: term algebras, 
a theory of algebraic datatypes, and a first-order logic with 
transitive closure. We show how a term algebra can be 
straightforwardly translated into first-order logic, using four 
axioms. One of these is a generator axiom that causes all 
models to be infinite. In the following section, we establish 
our main result about when this axiom can be omitted. 
Throughout the paper, we use binary trees as the canon- 
ical example of algebraic datatypes. Being simple and famil- 
iar, trees serve well as a pedagogical example. In addition, 


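\[
\begin{array}{rcl}
S & ::= & S_0 \mid S_1\ \mathit{setOp}\ S_2 \mid S.R \mid \{x_1, \ldots, x_n\} \\
R & ::= & R_0 \mid R_1\ \mathit{setOp}\ R_2 \mid R_1.R_2 \mid \Delta \mid \text{\^{}}R \mid {*}R \\
\mathit{setOp} & ::= & \cup \mid \cap \mid \setminus \\
A & ::= & (x_1, x_2) \in R \mid x \in S \mid S_1 = S_2 \mid R_1 = R_2 \mid x_1 = x_2 \\
B & ::= & A \mid B_1 \wedge B_2 \mid \neg B_1 \\
F & ::= & B \mid \forall x :: \mathit{sort}.\, F \mid \exists x :: \mathit{sort}.\, F \mid F_1 \wedge F_2 \mid \neg F_1 \\
\mathit{sort} & ::= & \mathsf{Tree} \mid \mathsf{Object}
\end{array}
\]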


Figure 1: A Logic with Transitive Closure 


because trees can represent all other algebraic datatypes, 
there is no loss of generality. 


Algebraic datatypes and term algebras. We consider 
structures that contain two kinds of values, or ’sorts’: 1) 
a Tree sort, corresponding to algebraic datatypes, and 2) 
an Object sort, corresponding to all remaining values. In 
a programming language such as ML [31], we would define 
this algebraic datatype using a declaration such as: 


datatype Tree = Nil | Node of Tree * Object * Tree 
datatype Object = Obj1 | Obj2 | ... | ObjN


Note that the datatype Tree has an infinite set of values, 
because there is no bound on the size of a tree. On the 
other hand, we assume that we have already finitized the set 
Object corresponding to the remaining values. If our struc- 
ture had only values of sort Object, we could use existing 
techniques to search for models of formulas (such as [16]); 
our goal, however, is to reason about the structures that 
also contain values of the sort Tree. One view of the results 
of this paper is that we show how to effectively finitize al- 
gebraic datatypes, without making the conclusions derived 
in the finitization meaningless with respect to the intended 
world of arbitrarily large datatypes. 

Algebraic datatypes have proven to be useful not only 
in programming languages, but also in model theory, where 
they correspond to term algebras [26, Chapter 23], [13, Sec- 
tion 1.3]. Term algebras are algebras in which values are 
interpreted syntactically: given a term t without free vari- 
ables (a ground term), the interpretation of t is t itself. The 
term algebras corresponding to the Tree datatype are gen- 
erated by: 


1) a constant Nil of sort Tree, and 


2) a binary constructor Node of type 


Tree × Object × Tree → Tree


We next present the logic we use to express the properties 
of structures containing terms and objects; we use this logic 
to write the formulas that we wish to analyze. 


A logic with transitive closure. We consider a frag- 
ment of first-order logic with transitive closure. The syntax 
of our logic is in Figure 1: the nonterminal S denotes set- 
valued expressions, R denotes relation-valued expressions, 


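\[
M = (T, O, \iota), \qquad \alpha : \mathit{Vars} \to T \cup O
\]
\[
\begin{array}{rcl}
[\![S_1\ \mathit{setOp}\ S_2]\!]^{M,\alpha} &=& [\![S_1]\!]^{M,\alpha}\ \mathit{setOp}\ [\![S_2]\!]^{M,\alpha} \\
[\![S.R]\!]^{M,\alpha} &=& \{\, y \mid \exists x \in [\![S]\!]^{M,\alpha}.\ (x, y) \in [\![R]\!]^{M,\alpha} \,\} \\
[\![\{x_1, \ldots, x_n\}]\!]^{M,\alpha} &=& \{\alpha(x_1), \ldots, \alpha(x_n)\} \\
[\![R_1\ \mathit{setOp}\ R_2]\!]^{M,\alpha} &=& [\![R_1]\!]^{M,\alpha}\ \mathit{setOp}\ [\![R_2]\!]^{M,\alpha} \\
[\![R_1.R_2]\!]^{M,\alpha} &=& [\![R_1]\!]^{M,\alpha} \circ [\![R_2]\!]^{M,\alpha} \\
 &=& \{\, (x, z) \mid \exists y.\ (x, y) \in [\![R_1]\!]^{M,\alpha} \wedge (y, z) \in [\![R_2]\!]^{M,\alpha} \,\} \\
[\![\Delta]\!]^{M,\alpha} &=& \{\, (x, x) \mid x \in T \,\} \\
[\![S_1 = S_2]\!]^{M,\alpha} &=& ([\![S_1]\!]^{M,\alpha} = [\![S_2]\!]^{M,\alpha}) \\
[\![R_1 = R_2]\!]^{M,\alpha} &=& ([\![R_1]\!]^{M,\alpha} = [\![R_2]\!]^{M,\alpha}) \\
[\![x_1 = x_2]\!]^{M,\alpha} &=& (\alpha(x_1) = \alpha(x_2)) \\
[\![(x_1, x_2) \in R]\!]^{M,\alpha} &=& (\alpha(x_1), \alpha(x_2)) \in [\![R]\!]^{M,\alpha} \\
[\![x \in S]\!]^{M,\alpha} &=& \alpha(x) \in [\![S]\!]^{M,\alpha} \\
[\![\text{\^{}}R]\!]^{M,\alpha} &=& \{\, (x_0, x_n) \mid \exists n \geq 1.\ \exists x_1, \ldots, x_{n-1} \in T.\ \bigwedge_{i=1}^{n} (x_{i-1}, x_i) \in [\![R]\!]^{M,\alpha} \,\} \\
[\![{*}R]\!]^{M,\alpha} &=& [\![\Delta]\!]^{M,\alpha} \cup [\![\text{\^{}}R]\!]^{M,\alpha} \\
[\![A_1 \wedge A_2]\!]^{M,\alpha} &=& [\![A_1]\!]^{M,\alpha} \wedge [\![A_2]\!]^{M,\alpha} \\
[\![\neg A]\!]^{M,\alpha} &=& \neg [\![A]\!]^{M,\alpha} \\
[\![\forall x :: \mathsf{Tree}.\, F]\!]^{M,\alpha} &=& \forall t \in T.\ [\![F]\!]^{M,\alpha'}, \quad \alpha' = \alpha[x := t] \\
[\![\forall x :: \mathsf{Object}.\, F]\!]^{M,\alpha} &=& \forall o \in O.\ [\![F]\!]^{M,\alpha'}, \quad \alpha' = \alpha[x := o] \\
[\![\exists x :: \mathsf{Tree}.\, F]\!]^{M,\alpha} &=& \exists t \in T.\ [\![F]\!]^{M,\alpha'}, \quad \alpha' = \alpha[x := t] \\
[\![\exists x :: \mathsf{Object}.\, F]\!]^{M,\alpha} &=& \exists o \in O.\ [\![F]\!]^{M,\alpha'}, \quad \alpha' = \alpha[x := o]
\end{array}
\]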


Figure 2: Semantics for Logic of Figure 1 


$A$ denotes atomic formulas, $B$ denotes quantifier-free formulas,
and $F$ denotes general (potentially quantified) formulas.
The non-terminal $S_0$ denotes sets (corresponding to one-place
predicates), whereas the non-terminal $R_0$ denotes binary
relations (corresponding to two-place predicates). The notation
^r denotes the irreflexive transitive closure of the
binary relation r, whereas *r denotes the reflexive transitive
closure of r. (Among the expressions that we intentionally
omit are the universal set and relation inverse. These constructs
make it difficult to ensure certain locality properties
of the expressions that are useful for the formulation of our
result.) We use the shorthand $\exists! x.\, F$ for the formula

\[
\exists x.\, F(x) \wedge (\forall x, y.\ F(x) \wedge F(y) \Rightarrow x = y).
\]

We interpret formulas in our logic over two-sorted structures
$M = (T, O, \iota)$ where $T$ is the domain of the sort Tree, $O$
is the domain of the sort Object, and $\iota$ with domain $S_0 \cup R_0$
interprets the built-in sets and relations of the structure $M$,
so that $\iota(s) \subseteq T$ or $\iota(s) \subseteq O$ if $s$ is a built-in set, and
$\iota(r) \subseteq T^2$, $\iota(r) \subseteq O^2$, $\iota(r) \subseteq T \times O$, or $\iota(r) \subseteq O \times T$ if $r$ is
a built-in relation. The standard model-theoretic semantics
of our logic is in Figure 2. The function $\alpha : \mathit{Vars} \to T \cup O$
is a valuation that maps each variable to its value. If $\varphi$ is a
sentence (a formula with no free variables) then $[\![\varphi]\!]^{M,\alpha}$ does
not depend on $\alpha$, so when $\varphi$ is a sentence and $[\![\varphi]\!]^{M,\alpha} = \mathit{true}$,
we say that $M$ is a model of the sentence $\varphi$. A structure $M$
is a model of a set of sentences iff $M$ is a model of each of
the sentences in the set.
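
To make the relational operators of Figure 2 concrete, the following is a minimal Python sketch over finite sets of pairs. The function names and the explicit `universe` argument (standing in for the domain over which the identity relation is taken) are assumptions made for illustration; they are not part of the logic itself.

```python
def join(s, r):
    """S.R: the image of the set s under the binary relation r."""
    return {y for (x, y) in r if x in s}


def compose(r1, r2):
    """R1.R2: relational composition of two binary relations."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}


def transitive_closure(r):
    """^R: irreflexive transitive closure, computed as a least fixed point."""
    closure = set(r)
    while True:
        extra = compose(closure, r) - closure
        if not extra:
            return closure
        closure |= extra


def reflexive_transitive_closure(r, universe):
    """*R: the identity relation on `universe` united with ^R."""
    return {(x, x) for x in universe} | transitive_closure(r)


# A three-cell list L2 -> L1 -> L0, written as a `rest` relation.
rest = {("L2", "L1"), ("L1", "L0")}
cells = {"L0", "L1", "L2"}
proper_sublists = join({"L2"}, transitive_closure(rest))
all_sublists = join({"L2"}, reflexive_transitive_closure(rest, cells))
```

Here `all_sublists` corresponds to the Alloy expression c.*rest from Section 2, while `proper_sublists` corresponds to c.^rest.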

Note that, although we use set-theoretic notation such
as $x \in S$ and $(x, y) \in R$, our logic does not allow quantification
over sets, and is no stronger than first-order logic with
transitive closure. Recall, however, that first-order logic is 
very expressive. Indeed, first-order logic has been used as a 
foundation for set theory and all of mathematics [30]. Note 
also that the axioms and definitions used to represent many 
mathematical problems in first-order logic are often made 
under the assumption that the logic is interpreted over in- 
finite structures. We view the results of this paper as a 
contribution towards reasoning about infinite structures us- 
ing techniques that have proven to be effective for finite 
structures. 


A language for term algebras. The language in Fig- 
ure 1 presents a general logic over arbitrary sets and rela- 
tions. We next turn to the question of choosing the sets and 
relations that are appropriate for describing term algebras. 

Term algebras can be described using constructor rela- 
tions, such as Node, or using selector relations, which are 
the inverse of the constructors [13, Section 2.6]. For our 
purpose, it is more convenient to use selectors. Because we 
consider a binary tree, we use the selectors left, content, and 
right, where left and right denote the children of a node in the 
tree, and content denotes the Object value associated with 
a tree node. We represent selectors as binary relations that 
are partial functions defined on non-Nil terms. We define 
the relation node as the following shorthand: 


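\[
\mathsf{node}(t, t_1, o, t_2) \;\stackrel{\mathrm{def}}{=}\; (t, t_1) \in \mathsf{left} \wedge (t, o) \in \mathsf{content} \wedge (t, t_2) \in \mathsf{right} \wedge t \neq \mathsf{Nil}
\]

We also use the subterm relation, defined using transitive
closure:

\[
\mathsf{subterm} \;\stackrel{\mathrm{def}}{=}\; \text{\^{}}(\mathsf{left} \cup \mathsf{right})
\]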


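The term model. We are interested in checking the
satisfiability of formulas over the term algebra structure
$M_r = (T_r, O, \iota_r)$ given as follows:


• $T_r$ is the set of ground terms generated by the constant
Nil and the constructor Node; in other words, $T_r$ is the
least set such that

  1. $\mathsf{Nil} \in T_r$, and
  2. if $t_1, t_2 \in T_r$ and $o \in O$, then $\mathsf{Node}(t_1, o, t_2) \in T_r$.

• $O$ is a finite set;

• $\iota_r$ is defined as follows:

\[
\begin{array}{rcl}
\iota_r(\mathsf{Nil}) &=& \mathsf{Nil} \\
\iota_r(\mathsf{left}) &=& \{\, (\mathsf{Node}(t_1, o, t_2), t_1) \mid t_1, t_2 \in T_r,\ o \in O \,\} \\
\iota_r(\mathsf{content}) &=& \{\, (\mathsf{Node}(t_1, o, t_2), o) \mid t_1, t_2 \in T_r,\ o \in O \,\} \\
\iota_r(\mathsf{right}) &=& \{\, (\mathsf{Node}(t_1, o, t_2), t_2) \mid t_1, t_2 \in T_r,\ o \in O \,\}
\end{array}
\]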


Figure 3 sketches one part of the structure Mr. 
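
The term model itself is infinite, but any finite slice of it can be built directly. The sketch below is an illustration under stated assumptions: terms are encoded as nested Python tuples, and `terms_up_to_height` and `selectors` are hypothetical helper names. It enumerates all terms up to a given height and derives the three selector relations from them.

```python
from itertools import product

NIL = "Nil"


def node(t1, o, t2):
    """Encode the term Node(t1, o, t2) as a tuple (an assumed encoding)."""
    return ("Node", t1, o, t2)


def terms_up_to_height(objects, height):
    """All ground terms of height at most `height` over the given objects."""
    terms = {NIL}
    for _ in range(height):
        # One more application of Node to everything built so far.
        terms |= {node(t1, o, t2) for t1, o, t2 in product(terms, objects, terms)}
    return terms


def selectors(terms):
    """The relations left, content and right restricted to the non-Nil terms."""
    nodes = [t for t in terms if t != NIL]
    left = {(t, t[1]) for t in nodes}
    content = {(t, t[2]) for t in nodes}
    right = {(t, t[3]) for t in nodes}
    return left, content, right


small = terms_up_to_height(["o"], 2)
left, content, right = selectors(small)
```

With a single object, height 2 yields five terms: Nil, Node(Nil, o, Nil), and the three taller terms that have Node(Nil, o, Nil) as at least one child.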


Axioms for term algebras. We adopt the following axioms
to describe the properties of term algebras:


• Selectors: The binary relations left, content, and right
are total functions on the non-Nil elements of the sort
Tree, and are undefined on Nil:

\[
\begin{array}{ll}
1. & \forall t :: \mathsf{Tree}.\ t \neq \mathsf{Nil} \Rightarrow (\exists! t_1 :: \mathsf{Tree}.\ (t, t_1) \in \mathsf{left}) \wedge {} \\
   & \qquad (\exists! o :: \mathsf{Object}.\ (t, o) \in \mathsf{content}) \wedge {} \\
   & \qquad (\exists! t_1 :: \mathsf{Tree}.\ (t, t_1) \in \mathsf{right}) \\
2. & \forall t_1 :: \mathsf{Tree}.\ \forall o :: \mathsf{Object}.\ (\mathsf{Nil}, t_1) \notin \mathsf{left} \wedge {} \\
   & \qquad (\mathsf{Nil}, o) \notin \mathsf{content} \wedge {} \\
   & \qquad (\mathsf{Nil}, t_1) \notin \mathsf{right}
\end{array}
\]

We assume that a simple type system of our two-sorted
language rules out the application of relations to elements
of inappropriate sort; for example, if t :: Tree
and o :: Object, then $(t, o) \in \mathsf{left}$ is not a well-formed
formula.


• Uniqueness: The defined relation node has the properties
of a partial function:

\[
\forall t, t', t_1, t_2 :: \mathsf{Tree}.\ \forall o :: \mathsf{Object}.\
(t, t_1, o, t_2) \in \mathsf{node} \wedge (t', t_1, o, t_2) \in \mathsf{node} \Rightarrow t = t'
\]


• Generator: The defined relation node has the properties
of a total function:

\[
\forall t_1, t_2 :: \mathsf{Tree}.\ \forall o :: \mathsf{Object}.\ \exists t :: \mathsf{Tree}.\ (t, t_1, o, t_2) \in \mathsf{node}
\]

(This axiom holds in $M_r$, but we will consider the consequences
of omitting it from the axiomatization.)


• Acyclicity: A term is never a proper subterm of itself;
that is, the subterm relation is acyclic:

\[
\forall t :: \mathsf{Tree}.\ (t, t) \notin \mathsf{subterm}
\]


We denote by SUGA the conjunction of the axioms above 
(taking the first letter of the name of each axiom). 

Note that the SUGA axioms have no finite models.
Namely, although not all models of SUGA are isomorphic,
they all contain an infinite chain of elements $t_0, t_1, t_2, \ldots$
where $t_0 = \mathsf{Nil}$ and $(t_{i+1}, t_i, o, \mathsf{Nil}) \in \mathsf{node}$. These elements
exist by the Generator axiom; the Acyclicity axiom guarantees
that they are all distinct because they are ordered by
the subterm relation.
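
The argument above can be observed on small instances. The following Python sketch, which assumes a tuple encoding of terms and a hypothetical helper name `generator_counterexample`, searches a finite set of terms for a triple on which the Generator axiom fails. For any finite, nonempty set of terms and a nonempty set of objects such a triple exists, because the finitely many terms cannot cover all combinations of children.

```python
NIL = "Nil"


def node(t1, o, t2):
    """Encode the term Node(t1, o, t2) as a tuple (an assumed encoding)."""
    return ("Node", t1, o, t2)


def generator_counterexample(terms, objects):
    """Return some (t1, o, t2) whose Node(t1, o, t2) is missing from `terms`.

    Returns None only if the Generator axiom holds on `terms`, which cannot
    happen when `terms` is finite and both arguments are nonempty.
    """
    for t1 in terms:
        for o in objects:
            for t2 in terms:
                if node(t1, o, t2) not in terms:
                    return (t1, o, t2)
    return None


# The one-list structure from the counterexample to assertion C.
only_nil = {NIL}
witness = generator_counterexample(only_nil, ["E0"])

# A larger finite structure still misses some application of Node.
two_terms = {NIL, node(NIL, "E0", NIL)}
witness2 = generator_counterexample(two_terms, ["E0"])
```

The first witness is the pair of arguments (together with the element) for which the Alloy counterexample to assertion C had no Cons cell.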

In first-order logic without transitive closure, term alge- 
bras have a complete axiomatization [26, Chapter 23], [25]: 
there is a set of first-order sentences whose consequences are 
precisely the sentences that are true in the structure Mr 
(this set of sentences is infinite and requires some axiom 
schemas). However, even a complete axiomatization does 
not characterize the models up to isomorphism. For exam- 
ple, our SUGA axioms allow countable models with count- 
ably infinite paths of left and right that never terminate at 
Nil. The completeness of the axiomatization of term algebras 
is not of direct interest to us in any case, because a complete 
axiomatization forces the model to be infinite. Instead, we 
look for subclasses of formulas that can be checked on finite 
structures, and we show the soundness of our technique us- 
ing a model-theoretic approach: we look at the truth value 
of the formulas in the desired term model Mr (as opposed 
to checking whether the formulas are a consequence of an 
axiomatization of Mr as in a proof-theoretic approach). 


4 Finite Satisfiability Result 


This section presents the main results of our paper, which 
enable the checking of properties of algebraic datatypes us- 
ing finite models. The basic idea of our approach is the 
following: to prevent all models from being infinite, we drop 
the Generator axiom. Denote by SUA the conjunction of 
the remaining axioms (Selectors, Uniqueness, Acyclicity). It 
turns out that, among the finite structures, SUA character- 
izes precisely the substructures of the term model that are 
subterm-closed (if t is in the structure, then so is each subterm
of t). Having proved this characterization, we identify
a class of sentences whose validity in a finite model implies 
the validity in the full infinite model Mr. 

We next define the notion of a subterm-closed finite substructure
of $M_r$, illustrated in Figure 3. Intuitively, a finite
substructure $M_0 = (T_0, O, \iota_0)$ of $M_r$ is a structure obtained
from $M_r$ by selecting a finite set $T_0$ of terms and preserving
all the relations between the terms in $T_0$. A structure is
subterm-closed if every subtree of each tree in $T_0$ is also in $T_0$.
More precisely, we have the following.

Consider a term model $M_r = (T_r, O, \iota_r)$. A substructure
of $M_r$ is a structure $M_0 = (T_0, O, \iota_0)$ where $T_0 \subseteq T_r$ and
the relations given by $\iota_0$ are restrictions of the corresponding
relations given by $\iota_r$, that is, $\iota_0(\mathsf{left}) = \iota_r(\mathsf{left}) \cap T_0^2$,
$\iota_0(\mathsf{right}) = \iota_r(\mathsf{right}) \cap T_0^2$, and $\iota_0(\mathsf{content}) = \iota_r(\mathsf{content}) \cap (T_0 \times O)$.
(We have for simplicity assumed that substructures
have the same domain $O$ of values of the sort Object.)
A subterm-closed finite substructure of $M_r$ is a finite substructure
$M_0$ of $M_r$ whose domain of terms $T_0$ satisfies the
property $t \in T_0 \wedge (t, t_1) \in [\![\mathsf{subterm}]\!]^{M_r} \Rightarrow t_1 \in T_0$ for all
$t, t_1 \in T_r$.
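
The subterm-closure condition is easy to test mechanically. A minimal Python sketch follows, assuming terms are encoded as nested tuples; the helper names `subterms`, `is_subterm_closed` and `subterm_closure` are illustrative and do not come from the text.

```python
NIL = "Nil"


def node(t1, o, t2):
    """Encode the term Node(t1, o, t2) as a tuple (an assumed encoding)."""
    return ("Node", t1, o, t2)


def subterms(t):
    """The term t together with all of its proper subterms."""
    if t == NIL:
        return {NIL}
    return {t} | subterms(t[1]) | subterms(t[3])


def is_subterm_closed(terms):
    """True iff every subterm of every member of `terms` is itself a member."""
    return all(subterms(t) <= terms for t in terms)


def subterm_closure(terms):
    """The least subterm-closed set of terms containing `terms`."""
    closed = set()
    for t in terms:
        closed |= subterms(t)
    return closed


# The finite substructure sketched in Figure 3, with O = {o}.
t1 = node(NIL, "o", NIL)
t2 = node(t1, "o", NIL)
figure3 = {NIL, t1, t2}
```

Dropping t1 from `figure3` breaks closure, because t1 is the left child of t2; `subterm_closure` restores it.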

We then have the following completeness theorem that 
explains why the SUA axioms are adequate. This theorem 
allows us to ensure that any model of SUA axioms has pre- 
cisely the properties of a subterm-closed finite substructure 


Figure 3: Sketch of the infinite structure $M_r$ and an example
of one of its finite subterm-closed substructures $M_0 =
(T_0, \{o\}, \iota_0)$ where $T_0 = \{\mathsf{Nil}, t_1, t_2\}$ for $t_1 = \mathsf{Node}(\mathsf{Nil}, o, \mathsf{Nil})$
and $t_2 = \mathsf{Node}(t_1, o, \mathsf{Nil})$. The edges corresponding to the
content relation are omitted for clarity.


of the term model Mr. 


Theorem 1 (Axiomatization of Finite Substructures) 
A two-sorted structure M is a model of SUA iff M is iso- 
morphic to some subterm-closed finite substructure Mo of 
Mr. 


The proof of Theorem 1 is in Section 8. 

Having identified that the SUA axioms enforce that a 
finite structure “looks like” a term algebra Mr, we turn to 
the question of when it is the case that checking a property
in a finite model is sufficient to ensure that the property 
holds in the full model Mr. We first look at sentences that 
contain only existential quantifiers. 

An existential sentence is a formula of the form

\[
\exists t_1 :: \mathsf{Tree}.\ \ldots\ \exists t_n :: \mathsf{Tree}.\ \psi
\]

where $\psi$ is quantifier-free and the free variables of $\psi$ are
$t_1, \ldots, t_n$. The following is a fundamental property of structures
(dual to the property that if a universal sentence holds
in a structure then it also holds in its substructure).


Fact 1 Let $M_1$ be a substructure of $M$ and $\varphi$ a purely existential
sentence. If $\varphi$ holds in $M_1$, then $\varphi$ also holds in
$M$.


Informally, the property holds because the same existential 
witnesses from the smaller structure can be used in the larger 
structure. 

An important consequence of Theorem 1 and Fact 1 is
the following: if $\varphi$ is an existential sentence and the conjunction
$\mathrm{SUA} \wedge \varphi$ holds in a finite two-sorted model $M_1$, then
$\varphi$ holds in some subterm-closed finite substructure $M_0$ of
$M_r$, so $\varphi$ holds in the full term model $M_r$. We thus obtain
a method to check whether a formula $\varphi$ holds in $M_r$. In
the rest of this section, we generalize this result by allowing
arbitrary quantification in $\varphi$, as long as it is bounded
by previously introduced values. The intuitive reason why 
this generalization is possible is that we are checking for- 
mulas on subterm-closed structures, which means that the 
bounded quantification has the same semantics in the sub- 
structure Mo and in the full infinite structure Mr. 

Let $S$ be a set-valued term, denoted $S$ in Figure 1, and
suppose that $S$ does not contain the variable $t$. A bounded universal
term quantifier

\[
\forall_S\, t :: \mathsf{Tree}.\ F
\]

is a shorthand for the formula

\[
\forall t :: \mathsf{Tree}.\ t \in S \Rightarrow F
\]

Dually, a bounded existential term quantifier

\[
\exists_S\, t :: \mathsf{Tree}.\ F
\]

is a shorthand for the formula

\[
\exists t :: \mathsf{Tree}.\ t \in S \wedge F
\]


Hence, bounded quantifiers are expressible in terms of 
the ordinary quantifiers, but are more restrictive. An 
existential—bounded-universal sentence requires each uni- 
versal quantifier to be bounded by some set expression S. 
More precisely, we have the following definition: 


Definition 1 An existential—bounded-universal sentence 
(an EBU sentence) is a formula of the form 


\[
Q_1 v_1 :: s_1.\ \ldots\ Q_n v_n :: s_n.\ \psi
\]

where $\psi$ is a quantifier-free formula (denoted $B$ in Figure 1)
and each $Q_k v_k :: s_k$ is a quantifier or a bounded quantifier of
one of the following forms:


• An existential term quantifier $\exists v_k :: \mathsf{Tree}$;

• A bounded universal term quantifier $\forall_S\, v_k :: \mathsf{Tree}$ where
the free variables of the set-valued term $S$ are among
the previously quantified variables $v_1, \ldots, v_{k-1}$;

• A bounded existential term quantifier $\exists_S\, v_k :: \mathsf{Tree}$
where the free variables of the set-valued term $S$ are
among the previously quantified variables $v_1, \ldots, v_{k-1}$
(this quantifier is a special case of the existential term
quantifier);

• A universal object quantifier $\forall v_k :: \mathsf{Object}$;

• An existential object quantifier $\exists v_k :: \mathsf{Object}$.

We write $\varphi \in \mathrm{EBU}$ to denote that $\varphi$ is an EBU sentence.


Note the ways in which EBU sentences generalize purely 
existential sentences: not only is it possible to have arbi- 
trary bounded quantifiers, it is also possible to introduce 
new unrestricted existential quantifiers, even after bounded 
universal quantifiers. 
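
Definition 1 is purely syntactic, so membership in EBU can be decided by one pass over the quantifier prefix. The Python sketch below illustrates this under assumptions: the `Quantifier` record, its field names, and the representation of a bounding set expression by the set of its free variables are all hypothetical choices made for the example, and bounded object quantifiers are rejected because Definition 1 does not list them.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Quantifier(NamedTuple):
    kind: str                                   # "exists" or "forall"
    var: str                                    # the quantified variable
    sort: str                                   # "Tree" or "Object"
    # Free variables of the bounding set expression S, or None if unbounded.
    bound_vars: Optional[FrozenSet[str]] = None


def is_ebu_prefix(prefix: List[Quantifier]) -> bool:
    """Check a quantifier prefix against the five forms of Definition 1."""
    seen = set()
    for q in prefix:
        if q.sort == "Object":
            # Object quantifiers of either kind are allowed, but unbounded only.
            ok = q.bound_vars is None
        elif q.bound_vars is None:
            # An unbounded Tree quantifier must be existential.
            ok = q.kind == "exists"
        else:
            # A bounded Tree quantifier may mention only earlier variables.
            ok = q.bound_vars <= seen
        if not ok:
            return False
        seen.add(q.var)
    return True


# exists c :: Tree. forall_{S(c)} l :: Tree. forall e :: Object. ...
bounded = [
    Quantifier("exists", "c", "Tree"),
    Quantifier("forall", "l", "Tree", frozenset({"c"})),
    Quantifier("forall", "e", "Object"),
]

# exists e :: Object. exists l :: Tree. forall c :: Tree. ...
unbounded = [
    Quantifier("exists", "e", "Object"),
    Quantifier("exists", "l", "Tree"),
    Quantifier("forall", "c", "Tree"),
]
```

The first prefix is accepted and the second is rejected because of its unbounded universal term quantifier. A tool could run such a check before analysis and warn the user when it fails.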

We are now ready to state our main theorem: 


procedure TermSat
input: $\varphi$: an EBU sentence
output: if $\varphi$ is true in some term model $M_r$:
          a finite substructure $M_0$ of $M_r$ where $\varphi$ holds
        if $\varphi$ is false in all term models $M_r$:
          no result (infinite loop)
begin
  k := 1;
  while (true) do
    for each model $M_1 = (T, O, \iota)$ with $|T| + |O| = k$ do
      if ($\mathrm{SUA} \wedge \varphi$) holds in $M_1$ then return $M_1$; fi
    end
    k := k + 1;
  end
end TermSat.


Figure 4: A semidecision procedure for checking satisfiabil- 
ity of EBU sentences in the term model Mr. 


Theorem 2 (Finite Satisfiability Theorem) Let $\varphi$ be
an EBU sentence and $M_r$ a term model. Then $\varphi$ holds in
$M_r$ iff it holds in some subterm-closed finite substructure
$M_0$ of $M_r$.


The proof of Theorem 2 is in Section 8. 

The identification of EBU sentences and the proof that 
they can be verified on finite models is the main contribu- 
tion of this paper. In the sequel, we explore some of the 
consequences of this result. 


5 Consequences 


Given the results of the previous section, we can now an- 
swer the question we posed in Section 2. An analysis of an 
Alloy model in which algebraic datatypes are encoded re- 
lationally will yield sound counterexamples so long as the 
formula being checked is in EBU. A syntactic check for 
membership in EBU based on Definition 1 is easy to imple- 
ment. The result extends to bounded model checkers (such 
as NuSMV [6]) whose analysis consists of finding a model of 
a formula. The language of formulas for such model checkers 
could be soundly extended to EBU sentences over algebraic 
datatypes. 


Analysis procedure. The analysis procedure suggested
by our result is shown in Figure 4: to check whether an EBU
sentence $\varphi$ holds in $M_r$, search for increasingly large finite
models of $\mathrm{SUA} \wedge \varphi$. The procedure captures the spirit of analyses
such as that performed by the Alloy Analyzer [16,17];
in practice, the search for models would employ pruning and
heuristics and would not require an exhaustive enumeration.
The correctness and completeness of the procedure follow
from the results of the previous section and constitute the
main result of this paper:


• we can check the condition that a structure is a
subterm-closed submodel of $M_r$ by simply conjoining
axioms SUA to $\varphi$, thanks to Theorem 1;

• we know that the existence of the returned finite model
$M_1$ implies that $\varphi$ holds in $M_r$, thanks to the ($\Leftarrow$)
(soundness) direction of Theorem 2;

• we know that if $\varphi$ holds in the model $M_r$, then the algorithm
will find a finite model $M_1$ which proves this
fact, thanks to the ($\Rightarrow$) (completeness) direction of
Theorem 2.
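
The procedure of Figure 4 can be sketched in Python under a simplifying assumption: since Theorem 1 says that finite models of SUA are, up to isomorphism, the subterm-closed finite substructures of the term model, the sketch enumerates those substructures directly instead of enumerating arbitrary structures and filtering by SUA. The formula is represented by a hypothetical Python predicate `holds` over a set of terms, and `max_size` is an added cutoff that the procedure in Figure 4 does not have.

```python
from itertools import product

NIL = "Nil"


def extensions(terms, objects):
    """Subterm-closed sets obtained by adding one new Node over `terms`."""
    for t1, o, t2 in product(terms, objects, terms):
        t = ("Node", t1, o, t2)
        if t not in terms:
            yield terms | {t}


def term_sat(holds, objects, max_size=None):
    """Search subterm-closed sets of terms in order of increasing size.

    Returns the first set on which `holds` is true. With max_size=None the
    search runs forever when no such set exists, as in Figure 4.
    """
    level = {frozenset([NIL])}      # the only subterm-closed set of size 1
    size = 1
    while max_size is None or size <= max_size:
        for terms in level:
            if holds(terms):
                return terms
        # Every subterm-closed set of size k+1 extends one of size k.
        level = {frozenset(ext) for terms in level
                 for ext in extensions(terms, objects)}
        size += 1
    return None


def has_non_nil_left_child(terms):
    """Stands in for: exists t, t1. (t, t1) in left and t1 != Nil."""
    return any(t != NIL and t[1] != NIL for t in terms)


found = term_sat(has_non_nil_left_child, ["o"])
gave_up = term_sat(lambda terms: False, ["o"], max_size=3)
```

In this run the search succeeds at size 3, the smallest size at which some term can have a non-Nil left child. A real tool would evaluate the EBU sentence itself on each candidate and would prune the search rather than enumerate exhaustively.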


Closure under boolean operations. Having identified
EBU sentences as a useful class of formulas for which the
algorithm in Figure 4 is applicable, we next examine the following
question. If $\varphi_1, \varphi_2 \in \mathrm{EBU}$, is there an effectively constructible
sentence $\varphi \in \mathrm{EBU}$ such that the following equivalences
hold in $M_r$:


• $\varphi \Leftrightarrow (\varphi_1 \wedge \varphi_2)$

• $\varphi \Leftrightarrow (\varphi_1 \vee \varphi_2)$

• $\varphi \Leftrightarrow (\neg \varphi_1)$

• $\varphi \Leftrightarrow (\varphi_1 \Rightarrow \varphi_2)$


It turns out that the answer to the first two questions is “yes”,
whereas the answer to the last two questions is “no”. In
other words, EBU sentences are closed only under positive 
boolean combinations, but are not closed under negation or 
implication. We make this claim precise using the following 
two propositions. 


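Proposition 1 Let $\varphi_1 = BQ_1.F_1$ and $\varphi_2 = BQ_2.F_2$ be
EBU sentences where $BQ_1$ and $BQ_2$ denote sequences of
quantifiers and bounded quantifiers and where $F_1, F_2$ are
quantifier-free formulas. Let $BQ_2'.F_2'$ be the result of renaming
the variables in $\varphi_2$ so that they are all distinct from the
variables in $\varphi_1$. Then

\[
\begin{array}{rcl}
\varphi_1 \wedge \varphi_2 & \Leftrightarrow & BQ_1.BQ_2'.\ F_1 \wedge F_2' \\
\varphi_1 \vee \varphi_2 & \Leftrightarrow & BQ_1.BQ_2'.\ F_1 \vee F_2'
\end{array}
\qquad (1)
\]

Moreover, $BQ_1.BQ_2'.\ F_1 \wedge F_2'$ and $BQ_1.BQ_2'.\ F_1 \vee F_2'$ are EBU
sentences.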


The condition (1) follows from the basic monotonicity property
of quantifiers and operations $\land$, $\lor$. The fact that the
concatenation of disjoint EBU sequences of quantifiers is 
again an EBU sequence of quantifiers follows from the defi- 
nition of EBU sentences. 

We next turn to the absence of closure under negation
and implication. We first note that the entire class of EBU 
sentences is undecidable. 


Fact 2 The problem of determining, given an EBU sentence
$\varphi$, whether $\varphi$ holds in $M_T$, is undecidable.


Fact 2 follows from the fact that the subterm relation is 
definable in our logic, and from the undecidability result 
in [39, Section 4], which shows a reduction from the Post cor- 
respondence problem to the satisfiability of sentences with 
existential quantifiers and bounded universal quantifiers. 

Fact 2 has two main consequences for this paper. The 
first consequence is that, from the viewpoint of computabil- 
ity, the semidecision procedure in Figure 4 is as good as 
we can hope for. The second consequence is the absence 
of closure under negation and implication, as given by the 
following proposition. 


Proposition 2 There is no effective algorithm that, given
a sentence $\varphi \in \mathrm{EBU}$, constructs a sentence in EBU
equivalent to $\lnot\varphi$. Consequently, there is no such algorithm that
constructs a sentence equivalent to $\varphi \Rightarrow \mathit{false}$.


The following is an indirect argument for Proposition 2:
suppose that there is such an algorithm. Consider any EBU
sentence $\varphi$. Let $\psi$ be the sentence computed by the algorithm,
so that $\psi$ is equivalent to $\lnot\varphi$. Then either $\varphi$ or $\psi$ holds in $M_T$,
so if we run two copies of the procedure in Figure 4 in
parallel, one with the input $\varphi$ and the other one with the input
$\psi$, then one of the two copies will eventually terminate and
we will conclude that $\varphi$ is either true or false. This implies
that the truth of EBU sentences in $M_T$ is decidable, contradicting
Fact 2.
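
The parallel composition used in this argument is ordinary dovetailing of two semi-decision procedures. A minimal sketch follows, assuming each search is exposed as a Python iterator that yields `False` while still searching and `True` once it has found a finite model; the helpers `succeeds_after` and `never_succeeds` are hypothetical stand-ins for the two runs of the procedure in Figure 4.

```python
from typing import Iterator


def dovetail(search_phi: Iterator[bool], search_neg: Iterator[bool]) -> bool:
    """Alternate single steps of two semi-decision procedures.

    Returns True if the search for the sentence succeeds first and False
    if the search for its negation succeeds first. If neither search
    ever succeeds, the loop does not terminate.
    """
    while True:
        if next(search_phi, False):
            return True
        if next(search_neg, False):
            return False


def succeeds_after(steps: int) -> Iterator[bool]:
    """Hypothetical search that finds a model after `steps` failed attempts."""
    for _ in range(steps):
        yield False
    yield True


def never_succeeds() -> Iterator[bool]:
    """Hypothetical search that never finds a model."""
    while True:
        yield False
```

If an EBU negation always existed and were computable, exactly one of the two searches would succeed for every input, so `dovetail` would always terminate; this is the decision procedure that Fact 2 rules out.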


Arbitrary algebraic datatypes. We have presented 
our result for binary trees, but it applies to all algebraic 
datatypes, and, more generally, to any structured data.
Indeed, it is easy in relational logic to reason about records
and tuples that have an a priori bounded number of
components: just introduce a new variable for each component.
What the results of this paper imply is that we can also
reason about structures, such as binary trees, that do not
have an a priori bound on their size. It is not difficult
to generalize the proofs of Theorem 1 and Theorem 2 to 
the case of any finite number of mutually recursive alge- 
braic datatypes. Alternatively, we can encode any number of 
datatypes using binary trees. (Indeed, the experience with
programming languages such as LISP [28] is convincing
evidence that data structures can be represented using
LISP-like lists, which are binary trees.) The idea of representing
algebraic datatypes with trees is to replace each constructor
application $C_k(t_1, \ldots, t_n, o_1, \ldots, o_m)$ with an expression

\[
\mathrm{Node}(f_k, o_0,
  \mathrm{Node}(t_1, o_1, \mathrm{Node}(t_2, o_2, \ldots \mathrm{Node}(t_p, o_p, \mathrm{Nil}) \ldots)))
\]

where $f_k$ is a finite tree (of size $O(\log k)$) that encodes the
name of the constructor $C_k$, where $p = \max(n, m)$, $t_i = \mathrm{Nil}$
if $i > n$, and $o_i = o_0$ if $i > m$. Here $o_0 \in O$ is some
arbitrary fixed object from the set of uninterpreted objects. The
corresponding selector relations are similarly definable using
quantifier-free formulas in terms of selectors left and right,
and so is the subterm relation. Note that, when reasoning
about arbitrary algebraic datatypes, we are interested not in 
all possible trees, but only in the substructure of Mr which 
is the image of the embedding. In other words, we would 
like to ensure that the binary trees that represent the values 
of variables in formulas are consistent with the type system 
of the original algebraic datatypes. Luckily, this condition 
is expressible using our logic with transitive closure. There- 
fore, it suffices to restrict all quantified variables to the terms 
that satisfy this condition, and the resulting formula can be 
checked using the algorithm in Figure 4. 
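
The encoding of a constructor application as a binary tree can be sketched as follows. The representation of trees as nested Python tuples, the function names, and in particular the bit-by-bit layout chosen for the tag tree $f_k$ are assumptions made for this illustration; the text only requires that $f_k$ be a finite tree of size $O(\log k)$ that identifies the constructor.

```python
from typing import Any, Optional, Sequence, Tuple

# A binary tree is None (Nil) or a triple (left, content, right).
Tree = Optional[Tuple[Any, Any, Any]]


def node(left: Tree, content: Any, right: Tree) -> Tree:
    return (left, content, right)


def encode_tag(k: int, o0: Any) -> Tree:
    """Encode constructor index k >= 0 as a chain with one node per bit.

    Assumed layout: a 1 bit puts a leaf node in the left child, a 0 bit
    leaves the left child Nil, so the chain has O(log k) nodes.
    """
    tree: Tree = None
    while True:
        bit_marker = node(None, o0, None) if k & 1 else None
        tree = node(bit_marker, o0, tree)
        k >>= 1
        if k == 0:
            return tree


def encode_application(
    k: int, trees: Sequence[Tree], objects: Sequence[Any], o0: Any
) -> Tree:
    """Build Node(f_k, o0, Node(t_1, o_1, ... Node(t_p, o_p, Nil) ...))."""
    p = max(len(trees), len(objects))
    spine: Tree = None
    for i in reversed(range(p)):
        t_i = trees[i] if i < len(trees) else None      # t_i = Nil if i > n
        o_i = objects[i] if i < len(objects) else o0    # o_i = o_0 if i > m
        spine = node(t_i, o_i, spine)
    return node(encode_tag(k, o0), o0, spine)
```

For example, `encode_application(0, [], ["a"], "o0")` yields a root whose left child is the one-node tag for constructor 0 and whose right child is the single spine node carrying the object `"a"`.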


The scope of our result. After realizing that our tech- 
nique applies to algebraic datatypes, a natural question to 
ask is: does the technique fundamentally depend on the 
properties of the structure of algebraic datatypes, such as 
the uniqueness of left and right relations (as given by the 
Selectors axiom), the uniqueness of the parent relation node 
(as given by the Uniqueness axiom), or even the acyclic- 
ity (given by the Acyclicity axiom)? When examining this 
question it is worthwhile to consider two separate questions: 


• How do we generalize the notion of subterm-bounded
substructures of $M_T$ to the case of substructures of
some other infinite structure M. of interest? (The
generalization of Theorem 2.)


Suppose that we are interested in checking constraints
over an infinite structure M. with relation symbols
$r_1, \ldots, r_n$. It turns out that the only essential
requirement on the structure M. is that, for some term
variable $t$, the set $[\![\{t\}.{*}(r_1 \cup \ldots \cup r_n)]\!]$ evaluated in M. is
finite for each valuation $\alpha$. In other words, as long as the set of
elements “below” each element of M. is finite, we can
use bounded quantification to reduce the truth value of
EBU sentences in M. to the satisfiability in finite
substructures closed under the “below” relation. In
particular, the technique applies to structures that contain
shared elements and cycles.


• How do we axiomatize a class of finite structures of
interest? (The generalization of Theorem 1.)


From an algorithmic point of view, this question ad- 
mits a wider spectrum of solutions than just the use of 
axioms in first-order logic with transitive closure (al- 
though the use of axioms may have an advantage in 
the context of constraint-solving tools). Indeed, given 
a family of finite structures of interest (in particular, 
given a family of finite subterm-closed substructures of 
M..) we can use any language of computable functions 
to define an executable test predicate that determines 
whether a finite structure is isomorphic to one of the 
finite structures of interest. In other words, we can use 
an algorithm specialized for a given problem to filter 
the finite structures of interest. This idea of using “ex- 
ecutable predicates” appears in the form of run-time as- 
sertions in many programming languages and has found 
applications in software testing [5,27]. 


Because of these generalizations, we expect our result to be 
applicable to a range of infinite structures. 
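
As an illustration of such an executable test predicate, the following sketch checks whether a finite structure satisfies the three SUA conditions. The axioms themselves are stated earlier in the paper; the reading used here (Selectors: Nil has no components and every other tree has exactly one left, right, and content value; Uniqueness: no two distinct trees share the same components; Acyclicity: no tree reaches itself via left or right) is an assumption reconstructed from how the axioms are used in the proofs, and the function name and argument layout are hypothetical.

```python
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def satisfies_sua(
    trees: Set[Hashable],
    nil: Hashable,
    left: Set[Pair],
    right: Set[Pair],
    content: Set[Pair],
) -> bool:
    """Executable check of the assumed Selectors/Uniqueness/Acyclicity conditions."""

    def image(rel: Set[Pair], x: Hashable) -> set:
        return {b for (a, b) in rel if a == x}

    if nil not in trees:
        return False

    # Selectors: Nil has no components, other trees have exactly one of each,
    # and tree-valued components stay inside the structure.
    for t in trees:
        sizes = (len(image(left, t)), len(image(right, t)), len(image(content, t)))
        if sizes != ((0, 0, 0) if t == nil else (1, 1, 1)):
            return False
        if not (image(left, t) | image(right, t)) <= trees:
            return False

    # Uniqueness: distinct non-Nil trees differ in at least one component.
    seen = set()
    for t in trees - {nil}:
        key = tuple(next(iter(image(r, t))) for r in (left, content, right))
        if key in seen:
            return False
        seen.add(key)

    # Acyclicity: no tree is reachable from itself in one or more steps.
    children = {t: image(left, t) | image(right, t) for t in trees}

    def reaches_self(start: Hashable) -> bool:
        stack, visited = list(children[start]), set()
        while stack:
            x = stack.pop()
            if x == start:
                return True
            if x not in visited:
                visited.add(x)
                stack.extend(children[x])
        return False

    return not any(reaches_self(t) for t in trees)
```

A predicate of this kind can be used to filter candidate structures produced by any enumeration strategy, without requiring the underlying tool to support transitive closure.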


6 Related Work 


Constraint-checking tools. Because they are fully automated,
model checking approaches based on finitization of the
problem space are very attractive. These approaches have
had great success for control-intensive problems [6,18] such 
as those arising in hardware verification. The complexity of 
software systems often comes from the data structures that 
they manipulate, and notations such as UML [35] have been 
used to describe such constraints. The Alloy notation [16,17] 
can also be used to describe such constraints; the Alloy An- 
alyzer tool [1] can then search for the structures that satisfy 
these constraints. Our experience in using the Alloy no- 
tation and the analyzer to reason about structured values 
was the immediate inspiration for this paper. Because it es- 
tablishes a general correspondence between satisfiability in 
finite and infinite models, our result is potentially applicable 
not only to Alloy, but also to tools such as MACE [29], Para- 
dox [7], USE [10], ProB [23], RACER [40], and FaCT [14]. 


Algebraic datatypes. Our paper uses algebraic 
datatypes as a well-studied example of unbounded structured
values. Algebraic datatypes are the basis of the
algebraic approach to formal specification and verification
[2,4,11,12]. The use of the list algebraic datatype was
pioneered by LISP [28]. User-defined algebraic datatypes go 
back to ML [31] and are used in variants such as Haskell [3] 
and Objective Caml [22]. 


Term algebras without transitive closure. The first- 
order theory of term algebras is decidable [25, 26, 37]. Be- 
cause the interpretation of Object is a finite set, omitting 
the transitive closure from our logic makes formulas decid- 
able even with arbitrary (not only bounded) quantifiers. 
The complexity of the resulting decision problem is non- 
elementary [8,9] with the height of the tower of exponen- 
tials linear in the number of quantifier alternations in the 
formula [42]. More tractable classes of term algebras include
the class of quantifier-free formulas [33]. Several
extensions of term algebras have been proved decidable
[20, 21, 36, 38, 41], mostly using quantifier elimination
techniques.


Term algebras with a subterm relation. Adding a 
subterm relation to the first-order language of term algebras 
makes the problem substantially more difficult. Indeed, [39] 
shows that even the satisfiability of formulas with bounded 
universal quantifiers is undecidable (although the satisfiabil- 
ity of the purely existential fragment with a subterm relation 
is still decidable). As we noted in Section 5, the undecidabil- 
ity result for algebras with subterms applies to our logic as 
well, because the subterm relation is expressible using tran- 
sitive closure. A search for counterexamples is useful even 
for an undecidable logic (and is, in fact, at least as important 
as the search for counterexamples in decidable logics), and 
the results of this paper show how to perform such a search
for a useful class of formulas. 


First-order logic with transitive closure. First-order 
logic with transitive closure is useful for reasoning about 
program data structures and has been used not only in Al- 
loy [17], but also in shape analysis tools such as TVLA [24] 
and PALE [32]. Among the decidable fragments with transi- 
tive closure are monadic second-order logic [19,34] and some 
subclasses of the existential monadic second-order logic of 
graphs [15]. 


Complexity and bounded quantification. In their 
study of lower and upper bounds on the complexity of logi- 
cal theories, Ferrante and Rackoff [9, Page 30] describe the 
notion of H-bounded structures for some function H, which 
enables a reduction of general quantifiers to bounded
quantifiers. The existence of an appropriate function $H$ implies
the decidability of the structure, so no such $H$ exists
for term algebras with a subterm relation. Our use of
bounded quantification is different: we have syntactically 
imposed boundedness of universal quantifiers and showed 
that it implies the ability to use finite structures to reason 
about certain classes of formulas in infinite structures. 


7 Conclusions 


The language of sets and relations has proven to be a very 
powerful notation for modelling a range of structures aris- 
ing in software design and analysis. Model finding tools 
have made this approach accessible and practical. So far, 
model finding tools have been restricted to arbitrarily large, 
but finite models. However, some useful structures are in- 
herently infinite, in particular the algebraic datatypes such 
as lists and trees. Such structures are widely used in im- 
plementations and models of software, but when we try to 
apply existing tools to these structures, we are faced, in
general, with either ruling out all models (which is sound, but
entirely useless), or allowing the possibility that the tool
returns unsound, meaningless models that do not apply to the
desired infinite structures. 

In this paper we have presented a useful and natural class 
of formulas for which the existence of a finite model reveals 
the satisfiability of the formula in the infinite structure. For 
this class of properties, we have proved that it is possible to 
partially axiomatize the desired structure in such a way that 
finite models are simply substructures of the desired infinite 
structure. In this way, concrete feedback from model finding 
tools can be brought to a range of ubiquitous data structures 
that would otherwise remain out of their scope. 
8 Proofs of Theorems


Theorem 1. A two-sorted structure $M$ is a model of SUA
iff $M$ is isomorphic to some subterm-closed finite substructure
$M_0$ of $M_T$.
Proof. We prove both directions of the equivalence.


($\Leftarrow$): Suppose that a structure $M$ is isomorphic to a
subterm-closed finite substructure $M_0$ of $M_T$. Then $M$ satisfies
the same formulas as $M_0$. Therefore, it suffices to verify that $M_0$
satisfies the SUA axioms Selectors, Uniqueness, Acyclicity.
Axioms Uniqueness and Acyclicity hold in $M_T$, so they hold
in $M_0$ as well: indeed, a relation that has two values in
a substructure $M_0$ also has two values in the larger structure
$M_T$, and a cycle in $M_0$ is also a cycle in $M_T$. Axiom
Selectors holds because $M_0$ is subterm-closed: the components
of every non-Nil term $t$ in $T_0$ are also in $T_0$.

($\Rightarrow$): Suppose that a finite two-sorted structure $M =
(T, O, \iota)$ satisfies the SUA axioms. We identify a subterm-closed
finite structure $M_0 = (T_0, O, \iota_0)$ isomorphic to $M$ by
establishing a relation $f \subseteq T \times T_T$ and showing that $g$ is an
isomorphism, where $g = f \cup \Delta_O$ and $\Delta_O$ is the identity
relation on $O$. We define $f$ using the following least fixpoint
construction. Let $f_0 = \{(\iota(\mathrm{Nil}), \mathrm{Nil})\}$ and let

\[
\begin{array}{rl}
f_{i+1} = f_i \cup \{\, (t', \mathrm{Node}(t_1, o, t_2)) \mid
  & (t', t'_1) \in \iota(\mathrm{left}),\ (t', t'_2) \in \iota(\mathrm{right}), \\
  & (t', o) \in \iota(\mathrm{content}), \\
  & (t'_1, t_1), (t'_2, t_2) \in f_i \,\}
\end{array}
\]

Then define $f = \bigcup_{i \geq 0} f_i$. In other words, we map $\iota(\mathrm{Nil})$ to
Nil and we extend the relation by following the parent relation
in both $M$ and $M_T$.
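
The least fixpoint construction of $f$ can be computed directly on a finite structure. The sketch below is illustrative only: it assumes the structure already satisfies Selectors, so that left, right, and content can be passed as Python dictionaries on non-Nil elements, and it represents terms as nested tuples with `None` standing for Nil. It uses chaotic iteration rather than the exact stages $f_0, f_1, \ldots$, which reaches the same fixpoint.

```python
from typing import Any, Dict, Hashable, Set


def build_term_map(
    trees: Set[Hashable],
    nil: Hashable,
    left: Dict[Hashable, Hashable],
    right: Dict[Hashable, Hashable],
    content: Dict[Hashable, Any],
) -> Dict[Hashable, Any]:
    """Map each element of a finite SUA model to the term it denotes."""
    f: Dict[Hashable, Any] = {nil: None}  # f_0 maps the interpretation of Nil to Nil
    changed = True
    while changed:
        changed = False
        for t in trees:
            if t in f:
                continue
            l, r = left[t], right[t]
            # Extend f to a parent once both of its children are mapped.
            if l in f and r in f:
                f[t] = (f[l], content[t], f[r])
                changed = True
    return f
```

On an acyclic structure every element is eventually mapped, so the domain of the result is the whole of `trees`; on a structure with a cycle, the elements on the cycle are never added.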

We next define a measure on the elements of structures
$M$ and $M_T$. Consider first an element $t \in T$ of structure $M$
and consider any sequence of elements $t_0, t_1, \ldots$ such that
$t_0 = t$ and $(t_i, t_{i+1}) \in \iota(\mathrm{left}) \cup \iota(\mathrm{right})$. Because $M$ satisfies
Acyclicity and $T$ is finite, the sequence is finite. Moreover,
because of the axiom Selectors, every maximal such sequence
terminates at the element $\iota(\mathrm{Nil})$. For each element $t$, let $d(t)$ be the
maximum of the lengths of all such sequences. We correspondingly
define $d(t)$ for $t \in T_T$ of the structure $M_T$.

We then prove by induction on $i$ the conjunction of the
following properties:


P1) $\mathrm{dom}(f_i) = \{t' \in T \mid d(t') \leq i\}$


P2) each relation $g_i = f_i \cup \Delta_O$ is a partial isomorphism,
that is, $g_i$ is an isomorphism between the structures
induced by the domain of $g_i$ (denoted $\mathrm{dom}(g_i)$) and the
range of $g_i$ (denoted $\mathrm{ran}(g_i)$).


Base case. $g_0$ is trivially a partial isomorphism because
$\mathrm{dom}(f_0) = \{\iota(\mathrm{Nil})\}$ and $\mathrm{ran}(f_0) = \{\mathrm{Nil}\} = \{\iota_T(\mathrm{Nil})\}$, so P2 holds.
Moreover, from Selectors it follows that if $d(t) = 0$ then
$t = \iota(\mathrm{Nil})$, so P1 also holds.

Inductive step. Suppose that $g_i$ satisfies P1 and P2; we
show that $g_{i+1}$ satisfies these properties as well.


• P1. Let $t' \in \mathrm{dom}(f_{i+1})$. By definition of $f_{i+1}$, there
exist $t'_1, t'_2 \in \mathrm{dom}(f_i)$ such that $(t', t'_1) \in \iota(\mathrm{left})$ and
$(t', t'_2) \in \iota(\mathrm{right})$. By inductive hypothesis, $d(t'_1) \leq i$
and $d(t'_2) \leq i$. By axiom Selectors, there are no elements
$x$ of $T$ other than $t'_1, t'_2$ such that $(t', x) \in \iota(\mathrm{left}) \cup
\iota(\mathrm{right})$. Therefore, $d(t') \leq 1 + \max(d(t'_1), d(t'_2)) \leq i + 1$.
Conversely, let $t' \in T$ be such that $d(t') \leq i + 1$.
If $d(t') \leq i$ then $t' \in \mathrm{dom}(f_i) \subseteq \mathrm{dom}(f_{i+1})$, so let
$d(t') = i + 1$. Then $t' \neq \iota(\mathrm{Nil})$, so by Selectors there exist
unique elements $t'_1, t'_2 \in T$ and $o \in O$ such that $(t', t'_1) \in
\iota(\mathrm{left})$, $(t', t'_2) \in \iota(\mathrm{right})$ and $(t', o) \in \iota(\mathrm{content})$.
Because $1 + \max(d(t'_1), d(t'_2)) = d(t') \leq i + 1$, we have
$d(t'_1) \leq i$ and $d(t'_2) \leq i$. By induction hypothesis,
$t'_1, t'_2 \in \mathrm{dom}(f_i)$. Again by induction hypothesis, $f_i$ is a
partial isomorphism, so there exist terms $t_1, t_2 \in T_T$
such that $(t'_1, t_1) \in f_i$ and $(t'_2, t_2) \in f_i$. By
definition of $f_{i+1}$ we have $(t', \mathrm{Node}(t_1, o, t_2)) \in f_{i+1}$, so
$t' \in \mathrm{dom}(f_{i+1})$.
• P2, functionality. For $t' \in \mathrm{dom}(f_i)$ the property
follows by inductive hypothesis, because $f_{i+1}$ does not add
new values to elements that are already in $f_i$. So let
$(t', t), (t', \tilde{t}) \in f_{i+1} \setminus f_i$. Then for both $(t', t)$ and $(t', \tilde{t})$
there are elements $t'_1, t'_2 \in T$ such that $(t', t'_1) \in \iota(\mathrm{left})$
and $(t', t'_2) \in \iota(\mathrm{right})$, and by Selectors these elements
are unique. Moreover, by induction hypothesis, $f_i$ is
functional, so $t'_1, t'_2$ are related via $f_i$ to unique elements
$t_1, t_2 \in T_T$. Therefore, $\mathrm{Node}(t_1, o, t_2)$ is the unique
element $t$ such that $(t', t) \in f_{i+1}$.


• P2, injectivity. By definition of $f_{i+1}$, there is exactly
one element $t$ with the property $(t, \mathrm{Nil}) \in f_{i+1}$, namely
$\iota(\mathrm{Nil})$. Hence, injectivity can be violated only on
non-Nil terms. Consider $t = \mathrm{Node}(t_1, o, t_2) \in \mathrm{ran}(f_{i+1})$. A
tuple $(t', t)$ is in $f_{i+1}$ only if there are some $t'_1, t'_2 \in T$
such that $(t'_1, t_1), (t'_2, t_2) \in f_i$, $(t', t'_1) \in \iota(\mathrm{left})$, and
$(t', t'_2) \in \iota(\mathrm{right})$. By induction hypothesis, such $t'_1, t'_2$
are unique because $f_i$ is a partial isomorphism.
Finally, by Uniqueness, there is at most one $t'$ such that
$(t', t'_1) \in \iota(\mathrm{left})$ and $(t', t'_2) \in \iota(\mathrm{right})$, so $t'$ is unique.


• P2, Nil preservation. Clearly $(\iota(\mathrm{Nil}), \mathrm{Nil}) \in f_0 \subseteq f_{i+1}$,
so the interpretation of Nil is mapped to the interpretation
of Nil.


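• P2, left preservation. Let $(t', t), (t'_1, t_1) \in f_{i+1}$. We
show that $(t', t'_1) \in \iota(\mathrm{left})$ iff $(t, t_1) \in \iota_T(\mathrm{left})$. If $t', t'_1 \in
\mathrm{dom}(f_i)$, the property holds by induction hypothesis,
so suppose that $t' \notin \mathrm{dom}(f_i)$ or $t'_1 \notin \mathrm{dom}(f_i)$.


Suppose first $(t', t'_1) \in \iota(\mathrm{left})$. Then $t' \in \mathrm{dom}(f_{i+1} \setminus f_i)$,
so by definition of $f_{i+1}$ there exist $o \in O$, $\hat{t}'_1 \in T$ and
$\hat{t}_1, t_2 \in T_T$ such that $(t', \hat{t}'_1) \in \iota(\mathrm{left})$, $(\hat{t}'_1, \hat{t}_1) \in f_i$, and
$(t', \mathrm{Node}(\hat{t}_1, o, t_2)) \in f_{i+1}$. By axiom Selectors, $\hat{t}'_1 = t'_1$,
so $(t'_1, \hat{t}_1) \in f_i$. We have shown above that $f_{i+1}$ is
functional, so $\hat{t}_1 = t_1$. Furthermore, $t = \mathrm{Node}(t_1, o, t_2)$.
By definition of $\iota_T$, $(t, t_1) \in \iota_T(\mathrm{left})$, as desired.


Conversely, suppose that $(t, t_1) \in \iota_T(\mathrm{left})$. By definition
of $\iota_T$, this means there are $o \in O$ and $t_2 \in T_T$ such
that $t = \mathrm{Node}(t_1, o, t_2)$. By definition of $f_{i+1}$, there are
$\hat{t}', \hat{t}'_1 \in T$ such that $(\hat{t}', t) \in f_{i+1}$, $(\hat{t}'_1, t_1) \in f_{i+1}$ and
$(\hat{t}', \hat{t}'_1) \in \iota(\mathrm{left})$. We have shown that $f_{i+1}$ is injective,
so $\hat{t}' = t'$ and $\hat{t}'_1 = t'_1$. Hence, we have $(t', t'_1) \in \iota(\mathrm{left})$,
as desired.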


• P2, right preservation. Analogous to the previous case.


This completes the inductive step. Given that the properties
hold for all $i$, let $n = \max\{d(t') \mid t' \in T\}$. Then $\mathrm{dom}(f_n) =
T$ and $g_{n+1} = g_n$, so $g = g_n$. Let $M_0 = (T_0, O, \iota_0)$ where
$T_0 = \mathrm{ran}(f_n)$ and $\iota_0(\mathrm{content}) = \{(t, o) \mid \exists t' \in T.\ (t', t) \in f_n \land (t', o) \in
\iota(\mathrm{content})\}$. Then $g$ is a bijection $T \cup O \to T_0 \cup O$, it preserves
left and right because $f_n$ does, and it preserves content by
construction. Therefore, $M_0$ is the desired model and $g$ is
the desired isomorphism.


Theorem 2. Let $\varphi$ be an EBU sentence and $M_T$ a term
model. Then $\varphi$ holds in $M_T$ iff it holds in some
subterm-closed finite substructure $M_0$ of $M_T$.
Proof. We prove both directions of the equivalence.

($\Leftarrow$): Suppose that $\varphi$ holds in a subterm-closed finite
substructure $M_0 = (T_0, O, \iota_0)$ of $M_T = (T_T, O, \iota_T)$. When
evaluating $\varphi$ in $M_T$, for any witness for an existential
quantifier we can pick the same witness in $M_T$ as in $M_0$, because
$T_0 \subseteq T_T$. Moreover, regardless of whether they are interpreted
in $M_0$ or $M_T$, the bounded universal quantifiers range only over
elements of $T_0$, so they still hold in $M_T$. We next make this
argument more precise.

Observe the following properties of set-valued and
relation-valued terms in our language, for every structure
$M$ and every valuation $\alpha$:


• if $R$ is a relation-valued expression, then
\[ [\![R]\!]^{M,\alpha} \subseteq [\![{*}(\mathrm{left} \cup \mathrm{right})]\!]^{M,\alpha} \qquad (2) \]


• if $S$ is a set-valued term with free variables $x_1, \ldots, x_n$
on which $\alpha$ is defined, then
\[ [\![S]\!]^{M,\alpha} \subseteq [\![\{x_1, \ldots, x_n\}.{*}(\mathrm{left} \cup \mathrm{right})]\!]^{M,\alpha} \qquad (3) \]

These properties follow by induction on the size of the
expressions $R$ and $S$.
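Note also that $M_0$ is a substructure of $M_T$, so by
induction on the size of $R$ and $S$ we have

\[
\begin{array}{rcl}
[\![R]\!]^{M_0,\alpha} & = & [\![R]\!]^{M_T,\alpha} \cap T_0^2 \\
{[\![S]\!]}^{M_0,\alpha} & = & [\![S]\!]^{M_T,\alpha} \cap T_0
\end{array}
\qquad (4)
\]

In this inductive proof, the interesting case is showing (after
applying the induction hypothesis)

\[
([\![R_1]\!]^{M_T,\alpha} \cap T_0^2) \circ ([\![R_2]\!]^{M_T,\alpha} \cap T_0^2)
  = ([\![R_1]\!]^{M_T,\alpha} \circ [\![R_2]\!]^{M_T,\alpha}) \cap T_0^2
\]

The $\subseteq$ inclusion holds by definition of the relation
composition $\circ$, whereas the $\supseteq$ inclusion follows from (2) and the
fact that $M_0$ is subterm-closed.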

We next show that the truth value of a quantifier-free
formula $F$ is the same in $M_0$ and $M_T$ when the free variables
of $F$ are interpreted in $T_0$. We show by induction on the
structure of formula $F$ the following claim:

For all $\alpha : \mathrm{Vars} \to T_0 \cup O$,

\[ [\![F]\!]^{M_0,\alpha} = [\![F]\!]^{M_T,\alpha}. \qquad (5) \]

Indeed, (5) holds for atomic formulas by condition (4) and
the assumption that $\alpha(x) \in T_0 \cup O$. Moreover, this property
is preserved by propositional combinations, so it holds for
all boolean combinations.

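Finally, given an EBU sentence $\varphi$, we prove the following
relationship for all quantified subformulas $F$ of $\varphi$: for all
$\alpha : \mathrm{Vars} \to T_0 \cup O$, $[\![F]\!]^{M_0,\alpha} = \mathit{true}$ implies $[\![F]\!]^{M_T,\alpha} = \mathit{true}$.

The base case corresponds to the previously proved case
of quantifier-free formulas. We show that the condition is
preserved under existential quantifiers, bounded universal
quantifiers, and quantifiers over the finite set $O$. So suppose
that $[\![F]\!]^{M_0,\alpha} = \mathit{true}$ implies $[\![F]\!]^{M_T,\alpha} = \mathit{true}$ for all
$\alpha : \mathrm{Vars} \to T_0 \cup O$, and suppose that $\alpha' : \mathrm{Vars} \to T_0 \cup O$
and $[\![F_1]\!]^{M_0,\alpha'} = \mathit{true}$.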


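• Let $F_1 \equiv \exists v_T :: \mathrm{Tree}.\ F$. Then there exists $t \in T_0$ such
that $[\![F]\!]^{M_0,\alpha} = \mathit{true}$ where $\alpha = \alpha'[v_T := t]$. By induction
hypothesis $[\![F]\!]^{M_T,\alpha} = \mathit{true}$, so $[\![F_1]\!]^{M_T,\alpha'} = \mathit{true}$.


• Let $F_1 \equiv \forall_S v_T :: \mathrm{Tree}.\ F$ for some set expression $S$.
Then $[\![F]\!]^{M_0,\alpha'[v_T := t]} = \mathit{true}$ for each $t \in [\![S]\!]^{M_0,\alpha'}$. From
(3), (4), $\alpha' : \mathrm{Vars} \to T_0 \cup O$, and the fact that $M_0$ is
subterm-closed, we conclude $[\![S]\!]^{M_T,\alpha'} = [\![S]\!]^{M_0,\alpha'} \subseteq T_0$.
Consider arbitrary $t \in [\![S]\!]^{M_T,\alpha'}$. Then $t \in [\![S]\!]^{M_0,\alpha'}$,
so $[\![F]\!]^{M_0,\alpha'[v_T := t]} = \mathit{true}$. Because $t \in T_0$, by
induction hypothesis $[\![F]\!]^{M_T,\alpha'[v_T := t]} = \mathit{true}$. This proves
$[\![F_1]\!]^{M_T,\alpha'} = \mathit{true}$.


• The cases $F_1 \equiv \exists v_O :: \mathrm{Object}.\ F$ and $F_1 \equiv \forall v_O ::
\mathrm{Object}.\ F$ are straightforward because the quantifiers
are monotonic and the structures $M_T$ and $M_0$ have the
same domain of uninterpreted objects $O$.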


This completes the proof of one direction of our statement.
Note that we have not relied on the fact that $M_T$ is a full
term model. In fact, this direction still holds for $M_0$ and $M_1$
where $M_0$ is a substructure of $M_1$, and $M_1$ is a substructure
of $M_T$: if the EBU sentence holds in $M_0$, then it also holds in
the larger substructure $M_1$. We will use this generalization
in the proof of the converse direction.

($\Rightarrow$): Let $\varphi$ be an EBU sentence. We prove
that for all subformulas $F$ of $\varphi$ the following holds: for each
$\alpha : \mathrm{Vars} \to T_T \cup O$, if $[\![F]\!]^{M_T,\alpha} = \mathit{true}$, then there exist a
finite subterm-closed model $M_0$ and a valuation $\alpha_0$ such that
$\alpha_0(x_i) = \alpha(x_i)$ for each variable $x_i$ free in $F$, and
$[\![F]\!]^{M_0,\alpha_0} = \mathit{true}$. The proof of this claim is by induction on
the number of quantifiers in $F$.

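For the base case, assume that $F$ is quantifier-free,
and let $x_1, \ldots, x_n$ be the variables of $F$. Then let $T_0 =
[\![\{x_1, \ldots, x_n\}.{*}(\mathrm{left} \cup \mathrm{right})]\!]^{M_T,\alpha}$ and let $M_0$ be the
substructure of $M_T$ induced by $T_0$. Let $\alpha_0(x_i) = \alpha(x_i)$ for
$1 \leq i \leq n$ and let $\alpha_0(v) = \alpha(x_1)$ for $v \notin \{x_1, \ldots, x_n\}$. Then
$\alpha_0 : \mathrm{Vars} \to T_0 \cup O$, so by (5) we have $[\![F]\!]^{M_0,\alpha_0} = \mathit{true}$; we have
thus identified the desired $M_0$ and $\alpha_0$.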
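
The carrier set used in the base case is the set of all subterms of the values assigned to the free variables. A minimal sketch of this subterm closure follows, assuming terms are represented as nested Python tuples `(left, content, right)` with `None` standing for Nil; the representation and the function name are illustrative choices, not part of the paper.

```python
from typing import Any, Iterable, Optional, Set, Tuple

Tree = Optional[Tuple[Any, Any, Any]]


def subterm_closure(terms: Iterable[Tree]) -> Set[Tree]:
    """Return all terms reachable from `terms` via zero or more left/right steps."""
    closed: Set[Tree] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t in closed:
            continue
        closed.add(t)
        if t is not None:
            left, _content, right = t
            stack.extend((left, right))
    return closed
```

Because every term is a finite tree, the closure of a finite set of terms is finite, which is the property that the inductive step relies on when it forms the subterm closure of a finite union of carrier sets.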

For the inductive step, assume that the claim holds for
formula $F$; we prove that it holds for $F_1$, which is the result of
quantifying $F$. Suppose that $[\![F_1]\!]^{M_T,\alpha}$ holds. We consider
several cases.


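• $F_1 \equiv \exists v_T :: \mathrm{Tree}.\ F$. Then there exists $t \in T_T$ such
that $[\![F]\!]^{M_T,\alpha'} = \mathit{true}$ where $\alpha' = \alpha[v_T := t]$. By induction
hypothesis, there exist $\alpha_0$ that agrees with $\alpha'$ on the free
variables of $F$ and a finite subterm-closed model $M_0$
such that $[\![F]\!]^{M_0,\alpha_0} = \mathit{true}$. This means that $[\![F_1]\!]^{M_0,\alpha_0} = \mathit{true}$, and
$\alpha_0$ certainly agrees with $\alpha$ on the variables free in $F_1$.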


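• $F_1 \equiv \forall_S v_T :: \mathrm{Tree}.\ F$. Let $\mathcal{S} = [\![S]\!]^{M_T,\alpha}$. Assume first
$\mathcal{S} \neq \emptyset$. Then for each $t \in \mathcal{S}$, if $\alpha(t) = \alpha[v_T := t]$, then
$[\![F]\!]^{M_T,\alpha(t)} = \mathit{true}$, so by induction hypothesis there exist a
model $M_0(t) = (T_0(t), O, \iota_0(t))$ and a valuation $\alpha_0(t)$
such that $[\![F]\!]^{M_0(t),\alpha_0(t)} = \mathit{true}$, and $\alpha_0(t)$ agrees with $\alpha(t)$ on
the free variables of $F$. Then let

\[ T'_0 = \mathcal{S} \cup \bigcup_{t \in \mathcal{S}} T_0(t) \]

Let $T_0$ be the subterm closure of $T'_0$, given by $T_0 = T'_0 \cup
\{t \mid \exists t' \in T'_0.\ (t', t) \in [\![\mathrm{subterm}]\!]^{M_T,\alpha}\}$. The union $T'_0$ is
finite because $\mathcal{S}$ is finite, and each $T_0(t)$ is finite.
Therefore, the subterm closure $T_0$ is finite, so there exists a
finite subterm-closed structure $M_0 = (T_0, O, \iota_0)$. By
the generalized version of the ($\Leftarrow$) direction, because
$M_0(t)$ is a substructure of $M_0$, we have that $[\![F]\!]^{M_0,\alpha_0(t)} = \mathit{true}$
for each $t \in \mathcal{S}$. Because we have $[\![S]\!]^{M_0,\alpha} = \mathcal{S}$ we
conclude $[\![F_1]\!]^{M_0,\alpha_0(t_1)} = \mathit{true}$ where $t_1 \in \mathcal{S}$ is arbitrary.


Next, consider the special case $\mathcal{S} = \emptyset$. Let

\[ T_0 = [\![\{x_1, \ldots, x_n\}.{*}(\mathrm{left} \cup \mathrm{right})]\!]^{M_T,\alpha}, \]

where $x_1, \ldots, x_n$ are the free variables of $S$, and
consider the corresponding model $M_0 = (T_0, O, \iota_0)$. Then
$[\![S]\!]^{M_0,\alpha} = \emptyset$, so $[\![F_1]\!]^{M_0,\alpha} = \mathit{true}$.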


• $F_1 \equiv \exists v_O :: \mathrm{Object}.\ F$. This case is analogous to the
case $F_1 \equiv \exists v_T :: \mathrm{Tree}.\ F$.


• $F_1 \equiv \forall v_O :: \mathrm{Object}.\ F$. This case is similar to
the case $F_1 \equiv \forall_S v_T :: \mathrm{Tree}.\ F$, but slightly
simpler. For each $o \in O$, if $\alpha(o) = \alpha[v_O := o]$, then
$[\![F]\!]^{M_T,\alpha(o)} = \mathit{true}$, so by induction hypothesis there exist a
model $M_0(o) = (T_0(o), O, \iota_0(o))$ and a valuation $\alpha_0(o)$
such that $[\![F]\!]^{M_0(o),\alpha_0(o)} = \mathit{true}$ and $\alpha_0(o)$ agrees with $\alpha(o)$ on
the free variables of $F$. Then let

\[ T_0 = \bigcup_{o \in O} T_0(o) \]

The union $T_0$ is finite because $O$ is finite, and each
$T_0(o)$ is finite. $T_0$ is also subterm-closed because each
$T_0(o)$ is subterm-closed. Therefore, there exists a finite
subterm-closed structure $M_0 = (T_0, O, \iota_0)$. By the
generalized version of the ($\Leftarrow$) direction, because $M_0(o)$
is a substructure of $M_0$, we have that $[\![F]\!]^{M_0,\alpha_0(o)} = \mathit{true}$ for
each $o \in O$. We conclude $[\![F_1]\!]^{M_0,\alpha_0(o_1)} = \mathit{true}$ where $o_1 \in O$
is arbitrary.